Baseline: practical control variates for agent evaluation in zero-sum domains

Authors

  • Joshua Davidson
  • Christopher Archibald
  • Michael H. Bowling
Abstract

Agent evaluation in stochastic domains can be difficult. The commonplace approach of Monte Carlo evaluation can involve a prohibitive number of simulations when the variance of the outcome is high. In such domains, variance reduction techniques are necessary, but these techniques require careful encoding of domain knowledge. This paper introduces baseline as a simple approach to creating low variance estimators for zero-sum multi-agent domains with high outcome variance. The baseline method leverages the self play of any available agent to produce a control variate for variance reduction, subverting any extra complexity inherent with traditional approaches. The baseline method is also applicable in situations where existing techniques either require extensive implementation overhead or simply cannot be applied. Experimental variance reduction results are shown for both cases using the baseline method. Baseline is shown to surpass state-of-the-art techniques in three-player computer poker and is competitive in two-player computer poker games. Baseline also shows variance reduction in human poker and in a mock Ad Auction tournament from the Trading Agent Competition, domains where variance reduction methods are not typically employed.
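As a rough illustration of the control variate idea the abstract describes, the Python sketch below compares a plain Monte Carlo estimate with one corrected by a baseline term. It is not the authors' implementation. The function `simulate_match`, the additive payoff model, and every numeric parameter are hypothetical. The sketch assumes the baseline agent's self-play payoff on the same chance events has an expected value of zero, so that subtracting it leaves the mean unchanged while removing shared luck.

```python
import random
import statistics


def simulate_match(rng, edge=0.05, luck_sd=1.0, noise_sd=0.2):
    """Toy stand-in for one game; all names and numbers are hypothetical.

    Returns (outcome, baseline): the evaluated agent's payoff, and the payoff
    a baseline agent earns in self-play on the same chance events.
    """
    luck = rng.gauss(0.0, luck_sd)  # shared chance events, e.g. the deal
    outcome = edge + luck + rng.gauss(0.0, noise_sd)
    # Assumption: the self-play payoff has expected value zero.
    baseline = luck + rng.gauss(0.0, noise_sd)
    return outcome, baseline


def control_variate_values(samples, coefficient=1.0):
    """Per-game corrected values: outcome - coefficient * (baseline - 0)."""
    return [x - coefficient * y for x, y in samples]


rng = random.Random(0)
samples = [simulate_match(rng) for _ in range(20000)]

raw = [x for x, _ in samples]
corrected = control_variate_values(samples)

raw_mean, raw_var = statistics.fmean(raw), statistics.variance(raw)
cv_mean, cv_var = statistics.fmean(corrected), statistics.variance(corrected)

print(f"raw:       mean={raw_mean:.4f} variance={raw_var:.4f}")
print(f"corrected: mean={cv_mean:.4f} variance={cv_var:.4f}")
```

In this toy model both estimators target the same hypothetical skill edge, but the corrected values no longer carry the shared luck term, so far fewer games are needed for the same confidence interval. A coefficient of 1.0 is used for simplicity; it could instead be fitted from the sample covariance.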

Related resources

A General Framework for Reducing Variance in Agent Evaluation (University of Alberta)

In this work, we present a unified, general approach to variance reduction in agent evaluation using machine learning to minimize variance. Evaluating an agent’s performance in a stochastic setting is necessary for agent development, scientific evaluation, and competitions. Traditionally, evaluation is done using Monte Carlo estimation (sample averages); the magnitude of the stochasticity in th...

The Baseline Approach to Agent Evaluation

Efficient, unbiased estimation of agent performance is essential for drawing statistically significant conclusions in multi-agent domains with high outcome variance. Naïve Monte Carlo estimation is often insufficient, as it can require a prohibitive number of samples, especially when evaluating slow-acting agents. Classical variance reduction techniques typically require careful encoding of dom...

Investigating the effectiveness of Variance Reduction Techniques in Manufacturing, Call Center and Cross-docking Discrete Event Simulation Models

Variance reduction techniques have been shown by others in the past to be a useful tool to reduce variance in simulation studies. However, their application and success have been mainly domain specific, with relatively few guidelines as to their general applicability, in particular for novices in this area. To facilitate their use, this study aims to investigate the robustness of ...

Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

We consider the use of two additive control variate methods to reduce the variance of performance gradient estimates in reinforcement learning problems. The first approach we consider is the baseline method, in which a function of the current state is added to the discounted value estimate. We relate the performance of these methods, which use sample paths, to the variance of estimates based on...

Control Variates for Stochastic Gradient MCMC

It is well known that Markov chain Monte Carlo (MCMC) methods scale poorly with dataset size. We compare the performance of two classes of methods which aim to solve this issue: stochastic gradient MCMC (SGMCMC), and divide and conquer methods. We find an SGMCMC method, stochastic gradient Langevin dynamics (SGLD), to be the most robust in these comparisons. This method makes use of a noisy esti...

Publication date: 2013